Semi-Supervised Prediction-Constrained Topic Models
Abstract
Supervisory signals can help topic models discover low-dimensional data representations that are useful for a specific prediction task. We propose a framework for training supervised latent Dirichlet allocation that balances two goals: faithful generative explanations of high-dimensional data and accurate prediction of associated class labels. Existing approaches fail to balance these goals because they do not properly handle a fundamental asymmetry: the intended application is always predicting labels from data, not data from labels. Our new prediction-constrained objective for training generative models coherently integrates supervisory signals even when only a small fraction of training examples are labeled. We demonstrate improved prediction quality compared to previous supervised topic models, achieving results competitive with high-dimensional logistic regression on text analysis and electronic health records tasks while simultaneously learning interpretable topics.
Similar resources
Supplement: Semi-Supervised Prediction-Constrained Topic Models
This document contains supplementary material to the AISTATS 2018 accepted paper “Semi-Supervised Prediction-Constrained Topic Models.”
Prediction-Constrained Training for Semi-Supervised Mixture and Topic Models
Supervisory signals have the potential to make low-dimensional data representations, like those learned by mixture and topic models, more interpretable and useful. We propose a framework for training latent variable models that explicitly balances two goals: recovery of faithful generative explanations of high-dimensional data, and accurate prediction of associated semantic labels. Existing app...
MedLDA: maximum margin supervised topic models
A supervised topic model can use side information such as ratings or labels associated with documents or images to discover more predictive low dimensional topical representations of the data. However, existing supervised topic models predominantly employ likelihood-driven objective functions for learning and inference, leaving the popular and potentially powerful max-margin principle unexploit...
Prediction-Constrained Topic Models for Antidepressant Recommendation
Supervisory signals can help topic models discover low-dimensional data representations that are more interpretable for clinical tasks. We propose a framework for training supervised latent Dirichlet allocation that balances two goals: faithful generative explanations of high-dimensional data and accurate prediction of associated class labels. Existing approaches fail to balance these goals by ...
High-Performance Semi-Supervised Learning using Discriminatively Constrained Generative Models
We develop a semi-supervised learning method that constrains the posterior distribution of latent variables under a generative model to satisfy a rich set of feature expectation constraints estimated with labeled data. This approach encourages the generative model to discover latent structure that is relevant to a prediction task. We estimate parameters with a coordinate ascent algorithm, one s...